The LREC 2010 Resource Map

Authors

  • Nicoletta Calzolari
  • Claudia Soria
  • Riccardo Del Gratta
  • Sara Goggi
  • Valeria Quochi
  • Irene Russo
  • Khalid Choukri
  • Joseph Mariani
  • Stelios Piperidis
Abstract

In this paper we present the LREC Map of Language Resources (data and tools), an innovative feature introduced in conjunction with the LREC 2010 Conference. The purpose of the Map is to shed light on the vast amount of resources that represent the background of the research presented at LREC, in an attempt to fill a gap in the community's knowledge about the resources that are used or created worldwide. It also aims at a change of culture in the field, actively engaging each researcher in the task of documenting resources. The Map has been developed on the basis of the information provided by LREC authors during the submission of papers to the LREC 2010 conference and the LREC workshops, and contains information about almost 2000 resources. The paper illustrates the motivation behind this initiative, its main characteristics, its relevance and future impact in the field, the metadata used to describe the resources, and finally presents some of the most relevant findings.

This work was partially funded by the FLaReNet Thematic Network (http://www.flarenet.eu).

1. Why a Map of Language Resources?

The purpose of this paper is to introduce the LREC Map of Language Resources (data and tools), an entirely new instrument that has been developed in the framework of the LREC 2010 conference. The term “map” suggests a representation of the salient characteristics of a given territory, thus enabling the knowledge and discovery of its main features. A map is drawn in order to make new territories known, or to improve the knowledge of already discovered ones. Why should the “territory” of Language Resources need a map?
Several institutions worldwide maintain catalogues of language resources: ELRA (http://catalog.elra.info/), LDC (http://www.ldc.upenn.edu/Catalog/), the National Institute of Information and Communications Technology (NICT) Universal Catalogue (http://facet.shachi.org/?ln=en), the ACL Data and Code Repository (http://www.aclweb.org), OLAC (http://www.language-archives.org/), LT World (http://www.lt-world.org/), etc. However, it has been estimated that only 10% of existing resources are known, either through distribution catalogues or via direct publicity by providers (web sites and the like). The rest remains hidden, emerging only briefly when a resource is presented in a research paper or report at some conference. Even then, a resource may remain in the background simply because the focus of the research is not on the resource per se.
Knowledge about existing resources is essential to the overall advancement of research in the field: it is important to be able to locate and retrieve the right resources for the right applications, and to exploit existing ones before building new ones from scratch. Having a clear picture of which resources are available for which languages and for which use is important in order to identify existing gaps for certain languages at a given time and estimate the amount of investment needed to fill them.
Knowledge about the current use of resources is equally important. Knowing which resources are most used for the various applications will help to better understand the reasons behind their success (their intrinsic quality, their wide availability, their licensing model, etc.). Knowing which standards are used in resource representation would help improve the development of the standards themselves, by tuning them to actual needs and requirements. Clear and easy-to-reach information of this type about resources and related technologies is lacking. At the same time, it is very important to stress that most resources are very poorly documented, or not documented at all, which hinders their accessibility and, in the end, their full deployment.
We decided to exploit the unique opportunity offered by the LREC conference of gathering all major players of the sector in order to discover the resources directly or indirectly connected with the research presented at the conference. This felicitous conjunction of people and resources, we believe, will yield an unprecedented and comprehensive overview of the language resources currently being developed and used.

Similar resources

A Road Map for Interoperable Language Resource Metadata

LRs remain expensive to create and thus rare relative to demand across languages and technology types. The accidental re-creation of an LR that already exists is a nearly unforgivable waste of scarce resources that is unfortunately not so easy to avoid. The number of catalogs the HLT researcher must search, with their different formats, makes it possible to overlook an existing resource. This p...

LVF-lemon ― Towards a Linked Data Representation of "Les Verbes français"

In this study we elaborate a road map for the conversion of a traditional lexical syntactico-semantic resource for French into a linguistic linked open data (LLOD) model. Our approach uses current best-practices and the analyses of earlier similar undertakings (lemonUBY and PDEV-lemon) to tease out the most appropriate representation for our resource.

Language Resource Management System for Asian WordNet Collaboration and Its Web Service Application

This paper presents the language resource management system for the development and dissemination of Asian WordNet (AWN) and its web service application. We develop the platform to establish a network for the cross language WordNet development. Each node of the network is designed for maintaining the WordNet for a language. Via the table that maps between each language WordNet and the Princeton...

Language Technology Resource Center

This paper describes the Language Technology Resource Center (LTRC), a U.S. Government website for providing information and tools for users of languages (e.g., translators, analysts, etc.). The LTRC provides information on a broad range of products and tools, and provides a means for product developers and researchers to provide the U.S. Government and the public with information about their wo...

A Cross-Lingual Dictionary for English Wikipedia Concepts

We present a resource for automatically associating strings of text with English Wikipedia concepts. Our machinery is bi-directional, in the sense that it uses the same fundamental probabilistic methods to map strings to empirical distributions over Wikipedia articles as it does to map article URLs to distributions over short, language-independent strings of natural language text. For maximal i...

Publication date: 2010